Policy Search: Any Local Optimum Enjoys a Global Performance Guarantee

Authors

  • Bruno Scherrer
  • Matthieu Geist
Abstract

Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some predefined distribution. It is commonly believed that the best one can hope for in general from such an approach is a local optimum of this criterion. In this article, we show the following surprising result: any (approximate) local optimum enjoys a global performance guarantee. We compare this guarantee with the one satisfied by Direct Policy Iteration, an approximate dynamic programming algorithm that performs some form of Policy Search: although the approximation error of Local Policy Search may generally be larger (because local search requires considering a space of stochastic policies), we argue that the concentrability coefficient that appears in the performance bound is much nicer. Finally, we discuss several practical and theoretical consequences of our analysis.
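
The criterion described in the abstract can be written out as a short formula. The notation here is an assumption for illustration and does not come from the abstract: \nu is the predefined state distribution, \pi_\theta a policy with parameters \theta, and v_{\pi_\theta} its value function, written for a finite state space.

```latex
% Assumed notation (hypothetical, not taken from the abstract):
%   \nu            predefined distribution over states
%   \pi_\theta     policy parameterized by \theta
%   v_{\pi_\theta} value function of \pi_\theta
J_\nu(\theta)
  = \mathbb{E}_{s \sim \nu}\bigl[ v_{\pi_\theta}(s) \bigr]
  = \sum_{s} \nu(s)\, v_{\pi_\theta}(s)
```

Under this reading, Local Policy Search looks for parameters \theta that locally maximize J_\nu(\theta), and the claimed guarantee concerns the global quality of any such (approximate) local maximizer.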

Similar resources

Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search

Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some predefined distribution. The best one can hope in general from such an approach is to get a local optimum of this criterion. The first contribution of this article is ...

On the Performance Bounds of some Policy Search Dynamic Programming Algorithms

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on Policy Search algorithms, that compute an approximately optimal policy by following the standard Policy Iteration (PI) scheme via an ε-approximate greedy operator (Kakade and Langford, 2002; Lazaric et al., 2010). We describe existing and a few new performance bounds for Direc...

Modify the linear search formula in the BFGS method to achieve global convergence.

Global Optimality of Local Search for Low Rank Matrix Recovery

We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements. With noisy measurements we show all local minima are very close to a global optimum. Together with a curvature bound at saddle points, this yields a polynomial time global convergence guarantee for stochastic gradient descent ...

Constrained Nonlinear Optimal Control via a Hybrid BA-SD

The non-convex behavior presented by nonlinear systems limits the application of classical optimization techniques to solve optimal control problems for these kinds of systems. This paper proposes a hybrid algorithm, namely BA-SD, by combining Bee algorithm (BA) with steepest descent (SD) method for numerically solving nonlinear optimal control (NOC) problems. The proposed algorithm includes th...

Journal:
  • CoRR

Volume abs/1306.1520

Publication date: 2013